Abstract

Quantum computing promises exponential speed-ups for 
important simulation and optimization problems. It also
poses new CAD problems that are similar to, but more
challenging than, the related problems in classical
(non-quantum) CAD, such as determining whether two states or
circuits are functionally equivalent. While differences in classical
states are easy to detect, quantum states, which are rep- 
resented by complex-valued vectors, exhibit subtle differ- 
ences leading to several notions of equivalence. This pro- 
vides flexibility in optimizing quantum circuits, but leads 
to difficult new equivalence-checking issues for simulation 
and synthesis. We identify several distinct equivalence-checking
problems and present algorithms that prove fast and robust on
practical benchmarks, including quantum communication and search
circuits with hundreds of qubits.

1 Introduction 

Quantum computing (QC) is a recently discovered alter- 
native to conventional computer technology that offers not 
only miniaturization, but massive performance speed-ups 
for certain tasks [12, 19, 11] and new levels of protection
in secure communications JUfS). Information is stored in 
particle states and processed using quantum-mechanical operations
referred to as quantum gates. The analogue of the classical bit, the
qubit, has two basic states denoted $|0\rangle$ and $|1\rangle$, but
can also exist in a superposition of these two states
$|\psi\rangle = \alpha|0\rangle + \beta|1\rangle$, where
$|\alpha|^2 + |\beta|^2 = 1$. A composite system consisting of $n$
such qubits requires $2^n$ parameters (amplitudes) indexed by $n$-bit
binary numbers, $|\Psi\rangle = \sum_{j=0}^{2^n-1} \alpha_j |j\rangle$,
where $\sum_j |\alpha_j|^2 = 1$. Quantum gates transform such states by
applying unitary matrices to them. Measurement of a quantum state
produces classical bits with probabilities dependent on $\alpha_j$.
Combining several gates, as in Figure 1, yields quantum circuits [14]
that compactly describe more sophisticated transformations that play
the role of quantum algorithms.

Based on the success of CAD for classical logic circuits, 
new algorithms have been proposed for synthesis and simu- 
lation of quantum circuits lEfl [T7ll2Tfl[T0l [T1 |2l1|25l . In particular,
the DAC 2007 paper [13] describes what amounts to placement and
physical synthesis for quantum circuits: "adapting the circuit to
particulars of the physical environment which restricts/complicates
the establishment of certain direct interactions between qubits."
Another example is given in [17, Section 6].¹ Traditionally, such
transformations must be verified by equivalence checking, but the
quantum context is more difficult because qubits and quantum gates may
differ by global and relative phase (defined below), yet be equivalent
upon measurement [14]. To this end, our work is the first to develop
techniques for quantum phase-equivalence checking.

Two quantum states $|\psi\rangle$ and $|\varphi\rangle$ are equivalent
up to global phase if $|\varphi\rangle = e^{i\theta}|\psi\rangle$, where
$\theta \in \mathbb{R}$. The phase $e^{i\theta}$ will not be observed
upon measurement of either state [14]. By contrast, two states are
equal up to relative phase if a unitary diagonal matrix can transform
one into the other:

$$|\varphi\rangle = \mathrm{diag}(e^{i\theta_0}, e^{i\theta_1}, \ldots, e^{i\theta_{N-1}})\,|\psi\rangle. \qquad (1)$$

The probability amplitudes of the state $U|\psi\rangle$ will in general
differ by more than relative phase from those of $U|\varphi\rangle$, but
the measurement outcomes may be equivalent. One can consider a
hierarchy in which exact equivalence implies global-phase equivalence,
which implies relative-phase equivalence, which in turn implies
measurement outcome equivalence. The equivalence checking problem is
also extensible to quantum operators with applications to
quantum-circuit synthesis and verification, which involves
computer-aided generation of minimal quantum circuits with correct
functionality. Extended notions of equivalence create several design
opportunities. For example, the well-known three-qubit Toffoli gate can
be implemented with fewer controlled-NOT (CNOT) and 1-qubit gates up to
relative phase [3, 20], as shown in Figure 1. The relative-phase
differences can be canceled out if every pair of these gates in the
circuit is strategically placed [20]. Since circuit minimization is
being pursued for a number of key quantum arithmetic circuits with many
Toffoli gates, such as modular exponentiation [22, 9, 18, 17], this
optimization could reduce the number of gates even further.
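As a small worked illustration of why relative-phase equivalence is weaker than global-phase equivalence, consider the following single-qubit example. The two states and the choice of the Hadamard gate $H$ are assumptions made here for illustration only; they are not taken from the benchmarks.

```latex
% Two states that are equal up to relative phase (but not global phase):
|\psi\rangle = \tfrac{1}{\sqrt{2}}\bigl(|0\rangle + |1\rangle\bigr), \qquad
|\varphi\rangle = \tfrac{1}{\sqrt{2}}\bigl(|0\rangle - |1\rangle\bigr)
              = \mathrm{diag}(1, e^{i\pi})\,|\psi\rangle .

% Measured directly, both yield 0 or 1 with probability 1/2 each.
% After applying the same gate U = H, the amplitudes differ by more
% than a relative phase:
H|\psi\rangle = |0\rangle, \qquad H|\varphi\rangle = |1\rangle .
```

In this particular case the outcomes after $H$ also differ, which is why relative-phase differences must be tracked, or canceled by later gates, when they are used for circuit optimization.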

The inner product and matrix product may be used to de- 
termine such equivalences, but in this work, we present new 
decision-diagram (DD) algorithms to accomplish the task 
more efficiently. In particular, we make use of the quantum 

1 For example, in a spin chain architecture the qubits are laid out in a line,
and all CNOT gates must act only on adjacent (nearest-neighbor) qubits. The
work in [17] shows that such a restriction can be accommodated by restructuring
an existing circuit in such a way that worst-case circuit sizes grow by no
more than nine times.
Figure 1: Margolus' circuit is equivalent up to relative phase
to the Toffoli gate, which otherwise requires six CNOT and
eight 1-qubit gates to implement [16].

information decision diagram (QuIDD) [24, 23], a datastructure
with unique properties that are exploited to solve
this problem asymptotically faster in practical cases.

Empirical results confirm the algorithms' effectiveness 
and show that the improvements are more significant for 
the operators than for the states. Interestingly, solving the 
equivalence problems for the benchmarks considered re- 
quires significantly less time than creating the DD represen- 
tations, which indicates that such problems can be reason- 
ably solved in practice using quantum-circuit CAD tools. 

The structure of this work is as follows. Section 2 provides a review
of the QuIDD datastructure. Section 3 describes both linear-algebraic
and QuIDD algorithms for checking global-phase equivalence of states
and operators. Section 4 covers relative-phase equivalence checking
algorithms. Sections 3 and 4 also contain empirical studies comparing
the algorithms' performance on various benchmarks. Lastly, conclusions
and a summary of computational complexity results for all algorithms
are provided in Section 5.

2 Background 

The QuIDD is a variant of the reduced ordered binary decision diagram
(ROBDD or BDD) datastructure [7] applied to quantum circuit simulation
[24, 23]. Like other DD variants, it has all of the key properties of
BDDs as well as a few other application-specific attributes (see
Figure 2 for examples).

• It is a directed acyclic graph with internal nodes whose 
edges represent assignments to binary variables 

• The leaf or terminal nodes contain complex values 

• Each path from the root to a terminal node is a functional
mapping of row and column indices to complex-valued matrix
elements ($f : \{0,1\}^n \to \mathbb{C}$)

• Nodes are unique and shared, meaning that no two distinct nodes
$v$ and $v'$ have isomorphic subgraphs

• Variables whose values do not affect the function output
for a particular path (not in the support) are absent

• Binary row ($R_i$) and column ($C_i$) index variables have
evaluation order $R_0 \prec C_0 \prec \cdots \prec R_{n-1} \prec C_{n-1}$

The algorithms which manipulate DDs are just as important as the
properties of the DDs. In particular, the Apply algorithm (see
Figure 3) performs recursive traversals on DD operands to build new
DDs using any desired unary or binary function [7]. Although
originally intended for digital logic operations, Apply has been
extended to linear-algebraic operations such as matrix addition and
multiplication [2, 8], as well as quantum-mechanical operations such



Apply(A, B, b_op) {
    if (Is_Constant(A) and Is_Constant(B)) {
        return New_Terminal(b_op(Value(A), Value(B)));
    }
    if (Table_Lookup(R, b_op, A, B)) return R;
    v = Top_Var(A, B);
    T = Apply(A_v, B_v, b_op);
    E = Apply(A_v', B_v', b_op);
    R = ITE(v, T, E);
    Table_Insert(R, b_op, A, B);
    return R;
}



Figure 3: The Apply algorithm. Top_Var returns the
smaller variable index from A or B, while ITE creates a
new internal node with children T and E.

as measurement and partial trace [24, 23]. The runtime and memory
complexity of Apply is $O(|A||B|)$, where $|A|$ and $|B|$ are the sizes
in number of internal and terminal nodes of the DDs A and B,
respectively [7].² Thus, the complexity of DD-based algorithms is tied
to the compression achieved by the datastructure. These complexity
bounds are important for analyzing many of the algorithms presented in
this work.
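The recursion of Figure 3 can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not the QuIDD implementation: nodes are plain tuples `(var, then_child, else_child)`, terminals are Python numbers, a single variable per level indexes a vector, and the helper names `build`, `cofactor`, `ite` and `apply_op` are introduced here for the example.

```python
# Minimal decision-diagram sketch (illustrative; not the QuIDD package).
# Terminal: a Python number. Internal node: (var, then_child, else_child).

def is_const(node):
    return not isinstance(node, tuple)

def ite(var, then_child, else_child):
    # Reduction rule: drop a variable that does not affect the result.
    return then_child if then_child == else_child else (var, then_child, else_child)

def build(vec, var=0):
    # DD for a vector of length 2^k; variable `var` selects the upper half.
    if len(vec) == 1:
        return vec[0]
    half = len(vec) // 2
    return ite(var, build(vec[half:], var + 1), build(vec[:half], var + 1))

def cofactor(node, var, branch):
    # Child of `node` for var = branch; nodes that skip `var` are unchanged.
    if is_const(node) or node[0] != var:
        return node
    return node[1] if branch else node[2]

def apply_op(a, b, op, cache=None):
    cache = {} if cache is None else cache
    if is_const(a) and is_const(b):
        return op(a, b)                      # new terminal value
    if (a, b) in cache:                      # plays the role of Table_Lookup
        return cache[(a, b)]
    var = min(n[0] for n in (a, b) if not is_const(n))   # Top_Var
    t = apply_op(cofactor(a, var, 1), cofactor(b, var, 1), op, cache)
    e = apply_op(cofactor(a, var, 0), cofactor(b, var, 0), op, cache)
    result = ite(var, t, e)
    cache[(a, b)] = result                   # plays the role of Table_Insert
    return result

# Element-wise product of two length-4 vectors, computed on their DDs.
product = apply_op(build([1, 1, 0, 0]), build([2, 3, 2, 3]), lambda x, y: x * y)
```

Because each pair of operand nodes is cached, the recursion visits at most $|A||B|$ pairs, which is where the bound quoted above comes from.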

Another important aspect of Apply is that it utilizes a cache of
internal nodes and binary operators (Table_Lookup and Table_Insert) to
ensure that the new DD being created obeys the DD uniqueness
properties. Maintaining these properties makes many DDs such as QuIDDs
canonical, meaning that two different DDs do not implement the same
function. Thus, exact equivalence checking is trivial with canonical
DDs and may be performed in $O(1)$ time by comparing the root nodes, a
technique which has long been exploited in the classical domain [21].
Quantum state and operator equivalence is less trivial, as we show.

3 Checking Equivalence up to Global Phase 

This section describes algorithms that check global-phase 
equivalence of two quantum states or operators. The first 
two algorithms are known QuIDD-based linear-algebraic 
operations, while the remaining algorithms are the new ones 
that exploit DD properties explicitly. The section concludes 
with experiments comparing all algorithms. 

3.1 Inner Product Check 

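Since the quantum-circuit formalism models an arbitrary quantum state
as a unit vector, the inner product $\langle\psi|\psi\rangle = 1$. In
the case of a global-phase difference between two states
$|\psi\rangle$ and $|\varphi\rangle = e^{i\theta}|\psi\rangle$, the
inner product is the global-phase factor,
$\langle\psi|\varphi\rangle = e^{i\theta}\langle\psi|\psi\rangle = e^{i\theta}$.
Since $|e^{i\theta}| = 1$ for any $\theta$, and since by the
Cauchy-Schwarz inequality two unit vectors have an inner product of
modulus 1 only if they differ by such a factor, checking if the complex
modulus of the inner product is 1 suffices to check global-phase
equivalence for states.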
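The check can be illustrated with explicit arrays before turning to QuIDDs. The sketch below assumes unit-norm inputs given as Python sequences of complex amplitudes; the function name `global_phase_by_inner_product` and the tolerance `tol` are choices made for this example.

```python
import cmath
import math

def global_phase_by_inner_product(psi, phi, tol=1e-9):
    """Return e^{i*theta} with phi = e^{i*theta} * psi, or None if none exists.

    Both inputs are assumed to be unit-norm sequences of amplitudes.
    """
    # <psi|phi>: conjugate the first argument, then take the dot product.
    inner = sum(complex(a).conjugate() * b for a, b in zip(psi, phi))
    return inner if abs(abs(inner) - 1.0) <= tol else None

s = 1 / math.sqrt(2)
psi = [s, 1j * s]
phi = [cmath.exp(0.3j) * a for a in psi]   # psi times a global phase
other = [s, -1j * s]                       # orthogonal to psi
```

Here `global_phase_by_inner_product(psi, phi)` recovers the factor $e^{0.3i}$ up to rounding, while the call with `other` returns `None`.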

Although the inner product may be computed using explicit arrays, a
QuIDD-based implementation is easily derived. The complex-conjugate
transpose and matrix product with QuIDD operands have been previously
defined [24].



2 The runtime and memory complexity of the unary version acting on one
DD A is $O(|A|)$ [7].




Figure 2: Sample QuIDDs of (a) a 2-qubit equal superposition with relative phases and (b) the CNOT operator. Each internal
node (circle) is unique and depends on a variable listed to the left (dashed (solid) edge is 0 (1) assignment). Internal node labels
are unique hexadecimal identifiers based on each node's memory address. Terminal nodes (squares) contain complex values.



Thus, the algorithm computes the complex-conjugate trans- 
pose of A and multiplies the result with B. The complexity 
of this algorithm is given by the following lemma. 

Lemma 1 Consider state QuIDDs A and B with sizes $|A|$ and $|B|$,
respectively, in nodes. Computing the global-phase difference via the
inner product uses $O(|A||B|)$ time and memory.

Proof. Computing the complex-conjugate transpose of A requires
$O(|A|)$ time and memory since it is a unary call to Apply [24]. Matrix
multiplication of two ADDs of sizes $|A|$ and $|B|$ requires
$O((|A||B|)^2)$ time and memory [2]. However, this bound is loose for
an inner product because only a single dot product must be performed.
In this case, the ADD matrix multiplication algorithm reduces to a
single call of C = Apply(A, B, *) followed by D = Apply(C, +) [2]. D is
a single terminal node containing the global-phase factor if
$|value(D)| = 1$. Apply(A, B, *) and Apply(C, +) are computed in
$O(|A||B|)$ time and memory [7], while $|value(D)|$ is computed in
$O(1)$ time and memory. □

3.2 Matrix Product 

The matrix product of two operators can be used for global-phase
equivalence checking. In particular, since all quantum operators are
unitary, the adjoint of each operator is its inverse. Thus, if two
operators U and V differ by a global phase, then
$UV^{\dagger} = e^{i\theta} I$.

With QuIDDs for U and V, computing $V^{\dagger}$ requires $O(|V|)$ time
and memory [24]. Computing $W = UV^{\dagger}$ requires
$O((|U||V|)^2)$ time and memory [2]. To check if $W = e^{i\theta} I$,
any non-zero terminal value $t$ is chosen from W, and scalar division
is performed as W' = Apply(W, t, /), which takes $O((|U||V|)^2)$ time
and memory. Canonicity ensures that checking if W' = I requires only
$O(1)$ time and memory. If W' = I, then $t$ is the global-phase factor.
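The same test can be written with explicit matrices. The following is a minimal sketch on nested Python lists, not the QuIDD version; the helper names `dagger`, `matmul` and `global_phase_of_operators` are assumptions made for this example, and the candidate factor is read from the top-left entry of $W$, which must have modulus 1 whenever $W = e^{i\theta} I$.

```python
def dagger(m):
    # Complex-conjugate transpose of a square matrix given as nested lists.
    n = len(m)
    return [[complex(m[r][c]).conjugate() for r in range(n)] for c in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[r][k] * b[k][c] for k in range(n)) for c in range(n)]
            for r in range(n)]

def global_phase_of_operators(u, v, tol=1e-9):
    """Return t with U = t * V and |t| = 1, or None if no such t exists."""
    w = matmul(u, dagger(v))          # W = U V^dagger
    t = w[0][0]                       # candidate global-phase factor
    if abs(abs(t) - 1.0) > tol:
        return None
    n = len(w)
    for r in range(n):
        for c in range(n):
            expected = 1.0 if r == c else 0.0
            if abs(w[r][c] / t - expected) > tol:   # is W / t the identity?
                return None
    return t

X = [[0, 1], [1, 0]]
Z = [[1, 0], [0, -1]]
iX = [[0, 1j], [1j, 0]]               # X multiplied by the global phase i
```

With these inputs, `global_phase_of_operators(iX, X)` yields the factor $i$, whereas `global_phase_of_operators(Z, X)` returns `None`.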



3.3 Node-Count Check 

The previous algorithms merely translate linear-algebraic 
operations to QuIDDs, but exploiting the following QuIDD 
property leads to faster checks. 

Lemma 2 The QuIDD A' = Apply(A, c, *), where $c \in \mathbb{C}$
and $c \neq 0$, is isomorphic to A, hence $|A'| = |A|$.

Proof. In creating A', Apply expands all of the internal nodes of A
since $c$ is a scalar, and the new terminals are the terminals of A
multiplied by $c$. All terminal values $t_i$ of A are unique by
definition of a QuIDD [24]. Thus, $c\,t_i \neq c\,t_j$ for all $i, j$
such that $i \neq j$. As a result, the number of terminals in A' is the
same as in A. □
Lemma 2 states that two QuIDD states or operators that differ by a
non-zero scalar, such as a global-phase factor, have the same number of
nodes. Thus, equal node counts in QuIDDs are a necessary but not
sufficient condition for global-phase equivalence. To see why it is not
sufficient, consider two state vectors $|\psi\rangle$ and
$|\varphi\rangle$ with elements $w_j$ and $v_k$, respectively, where
$j, k = 0, 1, \ldots, N-1$. If some $w_j = v_k = 0$ such that
$j \neq k$, so that the zero entries occupy different positions, then
$|\varphi\rangle \neq e^{i\theta}|\psi\rangle$. The QuIDD
representations of these states can nevertheless have the same node
counts. Despite this drawback, the node-count check requires only
$O(1)$ time since Apply is easily augmented to recursively sum the
number of nodes as a QuIDD is created.
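Both halves of this argument can be checked on a toy decision diagram. The sketch below is illustrative only: it uses tuple nodes `(var, then_child, else_child)` with numeric terminals instead of QuIDDs, counts nodes by traversal rather than during Apply, and uses unnormalized vectors for readability. The names `build` and `node_count` are assumptions made for this example.

```python
def build(vec, var=0):
    # Reduced DD for a vector of length 2^k; terminals are numbers.
    if len(vec) == 1:
        return vec[0]
    half = len(vec) // 2
    lo, hi = build(vec[:half], var + 1), build(vec[half:], var + 1)
    return lo if lo == hi else (var, hi, lo)

def node_count(root):
    # Number of distinct internal and terminal nodes reachable from root.
    seen = set()
    def visit(node):
        if node in seen:
            return
        seen.add(node)
        if isinstance(node, tuple):
            visit(node[1])
            visit(node[2])
    visit(root)
    return len(seen)

psi = [0.5, 0.5, 0.5, -0.5]
scaled = [1j * x for x in psi]        # psi times the global phase i

# Zeros in different positions: same node count, yet not phase-equivalent.
a = [0, 1, 1, 1]
b = [1, 0, 1, 1]
```

Here `node_count(build(psi))` equals `node_count(build(scaled))`, as Lemma 2 predicts, while `a` and `b` also share a node count even though no scalar maps one to the other.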

3.4 Recursive Check 

Lemma 2 implies that a QuIDD-based algorithm can implement a sufficient
condition for global-phase equivalence by accounting for terminal value
differences. The pseudocode for such an algorithm (GPRC) is presented
in Figure 4.

GPRC returns true if two QuIDDs A and B differ by global phase and
false otherwise. gp and have_gp are global variables containing the
global-phase factor and a flag signifying whether or not a terminal
node has been reached, respectively. gp is defined only if true is
returned.

The first conditional block of GPRC deals with terminal 
values. The potential global-phase factor ngp is computed 



GPRC(A, B, gp, have_gp) {
    if (Is_Constant(A) and Is_Constant(B)) {
        if (Value(B) == 0) return (Value(A) == 0);
        ngp = Value(A)/Value(B);
        if (sqrt(real(ngp)*real(ngp) +
                 imag(ngp)*imag(ngp)) != 1)
            return false;
        if (!have_gp) {
            gp = ngp;
            have_gp = true;
        }
        return (ngp == gp);
    }
    if ((Is_Constant(A) and !Is_Constant(B))
        or (!Is_Constant(A) and Is_Constant(B)))
        return false;
    if (Var(A) != Var(B)) return false;
    return (GPRC(Then(A), Then(B), gp, have_gp)
            and GPRC(Else(A), Else(B), gp, have_gp));
}



Figure 4: Recursive global-phase equivalence check. 

Figure 5: One iteration of Grover's search algorithm
with an ancillary qubit used by the oracle. CPS is the
conditional phase shift operator, while the boxed
portion is the Grover iteration operator.

after handling division by 0. If $|ngp| \neq 1$ or if $ngp \neq gp$
when gp has been set, then the two QuIDDs do not differ by a global
phase. Next, the condition specified by Lemma 2 is addressed. If the
node of A depends on a different row or column variable than the node
of B, then A and B are not isomorphic and thus cannot differ by global
phase. Finally, GPRC is called recursively, and the results of these
calls are combined via the logical AND operation.

Early termination occurs when isomorphism is violated or more than one
phase difference is computed. In the worst case, both QuIDDs are
isomorphic and all nodes are visited, but the last terminal visited in
each QuIDD will not be equal up to global phase. Thus, the overall
runtime and memory complexity of GPRC for states or operators is
$O(|A| + |B|)$. Also, the node-count check can be run before GPRC to
quickly eliminate many nonequivalences.
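A runnable rendering of the recursion in Figure 4 is sketched below on toy decision diagrams. It is an illustration under stated assumptions rather than the QuIDD code: nodes are tuples `(var, then_child, else_child)`, the global variables gp and have_gp are replaced by a small dictionary, exact comparisons are replaced by a floating-point tolerance `tol`, and the names `build`, `gprc` and `global_phase` are introduced here.

```python
import cmath

def build(vec, var=0):
    # Reduced DD for a vector of length 2^k; terminals are numbers.
    if len(vec) == 1:
        return vec[0]
    half = len(vec) // 2
    lo, hi = build(vec[:half], var + 1), build(vec[half:], var + 1)
    return lo if lo == hi else (var, hi, lo)

def gprc(a, b, state, tol=1e-9):
    a_const = not isinstance(a, tuple)
    b_const = not isinstance(b, tuple)
    if a_const and b_const:
        if b == 0:
            return a == 0
        ngp = a / b                          # candidate global-phase factor
        if abs(abs(ngp) - 1.0) > tol:
            return False
        if state["gp"] is None:              # first terminal pair reached
            state["gp"] = ngp
        return abs(ngp - state["gp"]) <= tol
    if a_const != b_const:                   # structures differ
        return False
    if a[0] != b[0]:                         # nodes test different variables
        return False
    return gprc(a[1], b[1], state, tol) and gprc(a[2], b[2], state, tol)

def global_phase(a, b):
    """Return gp with A = gp * B, or None if A and B are not equivalent."""
    state = {"gp": None}
    return state["gp"] if gprc(a, b, state) else None

psi = [0.5, 0.5, 0.5, -0.5]
phase = cmath.exp(0.7j)
phi = [phase * x for x in psi]
```

Calling `global_phase(build(phi), build(psi))` recovers the factor $e^{0.7i}$ up to rounding. Python's short-circuiting `and` gives the early termination described above: the second recursive call is skipped as soon as the first one fails.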

3.5 Empirical Results for Global-Phase 
Equivalence Algorithms 

The first benchmark considered is a single iteration of Grover's
quantum search algorithm [11], which is depicted in Figure 5. The
oracle searches for the last item in the database [24]. One iteration
is sufficient to test the effectiveness of the algorithms since the
state vector QuIDD remains isomorphic across all iterations [24].

Figure 8: A QuIDD state combining $x$ and $a^x \bmod 15$ in binary.
The first qubit of each partition is least-significant.

Figure 6a shows the runtime results for the inner product and GPRC
algorithms (no results are given for the node-count check algorithm
since it runs in $O(1)$ time). The results confirm the asymptotic
complexity differences between the algorithms. The number of nodes in
the QuIDD state vector after a Grover iteration is $O(n)$ [24], which
is confirmed in Figure 6b. As a result, the runtime complexity of the
inner product should be $O(n^2)$, which is confirmed by a regression
plot within 1% error. By contrast, the runtime complexity of the GPRC
algorithm should be $O(n)$, which is also confirmed by another
regression plot within 1% error.

Figure 7a shows runtime results for the matrix product and GPRC
algorithms checking the Grover operator. Like the state vector, it has
been shown that the QuIDD for this operator grows in size as $O(n)$
[24], which is confirmed in Figure 7b. Therefore, the runtime of the
matrix product should be quadratic in $n$ but linear in $n$ for GPRC.
Regression plots verify these complexities within 0.3% error.

The next benchmark compares states in Shor's integer factorization
algorithm [19]. Specifically, we consider states created by the modular
exponentiation sub-circuit that represent all possible combinations of
$x$ and $f(x,N) = a^x \bmod N$, where $N$ is the integer to be factored
[19] (see Figure 8). Each of the $O(2^n)$ paths to a non-0 terminal
represents a binary value for $x$ and $f(x,N)$. Thus, this benchmark
tests performance with exponentially-growing QuIDDs.

Tables 1a-d show the results of the inner product and GPRC for this
benchmark. Each $N$ is an integer whose two non-trivial factors are
prime.³ The base $a$ is set to $N - 2$ since it may be chosen randomly
from the range $[2, N-2]$. In the case of Table 1a, states
$|\psi\rangle$ and $|\varphi\rangle$ are equal up to global phase. The
node counts for both states are equal as predicted by Lemma 2.
Interestingly, both algorithms exhibit nearly the same performance.
Tables 1b, 1c, and 1d contain results for the cases in which Hadamard
gates are applied to the first, middle, and last qubits, respectively,
of $|\varphi\rangle$. The results show that early termination in GPRC
can enhance performance by factors of roughly 1.5x to 10x.



3 Such integers are likely to be the ones input to Shor's algorithm since
they are the foundation of modern public key cryptography [19].

Figure 6: (a) Runtime results and regressions for the inner product and GPRC on checking global-phase equivalence of states
generated by a Grover iteration, (b) Size in node count and regression of the QuIDD state vector.

Figure 7: (a) Runtime results and regressions for the matrix product and GPRC on checking global-phase equivalence of the
Grover iteration operator, (b) Size in node count and regression of the QuIDD representation of the operator.



In almost every case, both algorithms represent far less 
than 1% of the total runtime. Thus, checking for global- 
phase equivalence among QuIDD states appears to be an 
easily achievable task once the representations are created. 
An interesting side note is that some modular exponentiation QuIDD
states with more qubits can have more exploitable structure than those
with fewer qubits. For instance, the N = 387929 (19 qubits) QuIDD has
fewer than half the nodes of the N = 163507 (18 qubits) QuIDD.

Table 2 contains results for the matrix product and GPRC algorithm
checking the inverse Quantum Fourier Transform (QFT) operator. The
inverse QFT is a key operator in Shor's algorithm [19], and it has been
previously shown that its $n$-qubit QuIDD representation grows as
$O(2^{2n})$ [24]. In this case, the asymptotic differences in the
matrix product and GPRC are very noticeable. Also, the memory usage
indicates that the matrix product may need asymptotically more
intermediate memory despite operating on QuIDDs with the same number of
nodes as GPRC.

4 Checking Equivalence up to Relative Phase 

The relative-phase checking problem can also be solved in 
many ways. The first three algorithms are adapted from lin- 
ear algebra to QuIDDs, while the last two exploit DD prop- 
erties directly, offering asymptotic improvements. 



No. of     Matrix Product             GPRC
Qubits     Time (s)     Mem (MB)      Time (s)     Mem (MB)
5          2.53         1.41          0.064        0.25
6          22.55        6.90          0.24         0.66
7          271.62       46.14         0.98         2.03
8          3637.14      306.69        4.97         7.02
9          22717        1800.42       17.19        26.48
10                      > 2GB         75.38        102.4
11                      > 2GB         401.34       403.9


Table 2: Performance results for the matrix product 
and GPRC algorithms on checking global-phase equiv- 
alence of the QFT operator used in Shor's factoring al- 
gorithm. > 2GB indicates that a memory usage cutoff 
of 2GB was exceeded. 

4.1 Modulus and Inner Product 

Consider two state vectors $|\psi\rangle$ and $|\varphi\rangle$ that
are equal up to relative phase and have complex-valued elements $w_j$
and $v_k$, respectively, where $j, k = 0, 1, \ldots, N-1$. Computing
$|\varphi'\rangle = \sum_{j=0}^{N-1} |v_j|\,|j\rangle$ and
$|\psi'\rangle = \sum_{k=0}^{N-1} |e^{i\theta_k} v_k|\,|k\rangle$ sets
each phase factor to 1, allowing the inner product to be applied as in
Subsection 3.1. The complex modulus operations are computed as
C = Apply(A, |·|) and D = Apply(B, |·|) with runtime and memory
complexity $O(|A| + |B|)$, which is dominated by the $O(|A||B|)$ inner
product complexity.
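On explicit arrays the procedure reduces to two element-wise moduli and one dot product. The sketch below assumes unit-norm inputs; the function name `equal_up_to_relative_phase` and the tolerance `tol` are choices made for this illustration, and the test states mirror the remote EPR-pair benchmark of Subsection 4.6 on four amplitudes.

```python
import cmath
import math

def equal_up_to_relative_phase(psi, phi, tol=1e-9):
    """Modulus-and-inner-product check for unit-norm state vectors."""
    mod_psi = [abs(x) for x in psi]          # sets every phase factor to 1
    mod_phi = [abs(x) for x in phi]
    inner = sum(a * b for a, b in zip(mod_psi, mod_phi))
    return abs(inner - 1.0) <= tol

s = 1 / math.sqrt(2)
psi = [s, 0, 0, s]
phi = [cmath.exp(0.345j) * s, 0, 0, cmath.exp(0.457j) * s]
other = [s, s, 0, 0]
```

As noted for this algorithm in Subsection 4.2, the check only returns a verdict; it does not recover the individual phase factors.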

4.2 Modulus and Matrix Product 



For operator equivalence up to relative phase, two cases are 
considered, namely the diagonal relative-phase matrix ap- 
pearing on the left or right side of one of the operators. 



(a)
No. of   N        Creation   No. of Nodes     No. of Nodes        Inner Product   GPRC
Qubits            Time (s)   $|\psi\rangle$   $|\varphi\rangle$   Runtime (s)     Runtime (s)
12       4031     11.9       9391             9391                0.30            0.26
13       6973     24.8       10680            10680               0.34            0.28
14       12127    55.1       18236            18236               0.54            0.46
15       19093    128.3      12766            12766               0.41            0.32
16       50501    934.1      51326            51326               1.7             1.6
17       69707    1969       26417            26417               0.87            0.78
18       163507   12788      458064           458064              19.6            19.6
19       387929   93547      182579           182579              6.62            6.02

(b)
No. of Nodes        Inner Product   GPRC
$|\varphi\rangle$   Runtime (s)     Runtime (s)
10969               0.27            0.036
11649               0.31            0.036
19978               0.54            0.06
13446               0.41            0.036
55447               1.53            0.2
27797               0.78            0.084
521725              19.0            9.18
194964              6.44            4.40

(c)
No. of   N        Creation   No. of Nodes     No. of Nodes        Inner Product   GPRC
Qubits            Time (s)   $|\psi\rangle$   $|\varphi\rangle$   Runtime (s)     Runtime (s)
12       4031     11.9       9391             11773               0.27            0.076
13       6973     24.8       10680            16431               0.43            0.14
14       12127    55.1       18236            29584               0.65            0.22
15       19093    128.3      12766            19207               0.56            0.20
16       50501    934.1      51326            71062               1.76            0.84
17       69707    1969       26417            46942               1.24            0.55
18       163507   12788      458064           653048              31.7            26.1
19       387929   93547      182579           312626              9.33            6.44

(d)
No. of Nodes        Inner Product   GPRC
$|\varphi\rangle$   Runtime (s)     Runtime (s)
14092               0.21            0.088
16431               0.27            0.084
29584               0.53            0.13
19207               0.50            0.084
74919               1.51            0.66
46942               1.13            0.25
629533              29.6            23.7
312626              13.0            8.62

Table 1: Performance results for the inner product and GPRC algorithms on checking global-phase equivalence of modular
exponentiation states. In (a), $|\psi\rangle = |\varphi\rangle$ up to global phase. In (b), (c), and (d), Hadamard gates are applied to the first, middle,
and last qubits, respectively, of $|\varphi\rangle$ so that $|\psi\rangle \neq |\varphi\rangle$ up to global phase.



Consider two operators U and V with elements $u_{j,k}$ and $v_{j,k}$,
respectively, where $j, k = 0, \ldots, N-1$. The two cases in which the
relative-phase factors appear on either side of V are described as
$u_{j,k} = e^{i\theta_j} v_{j,k}$ (left side) and
$u_{j,k} = e^{i\theta_k} v_{j,k}$ (right side). In either case the
matrix product check discussed in Subsection 3.2 may be extended by
computing the complex modulus without increasing the overall
complexity. Note that neither this algorithm nor the modulus and inner
product algorithm calculates the relative-phase factors.

4.3 Element-wise Division 

Given the states discussed in Subsection 4.1,
$w_k = e^{i\theta_k} v_k$, the operation $w_j / v_k$ for each $j = k$
is a relative-phase factor, $e^{i\theta_k}$. The condition
$|w_j / v_j| = 1$ is used to check if each division yields a relative
phase. If this condition is satisfied for all divisions, the states are
equal up to relative phase.

The QuIDD implementation for states is simply C = Apply(A, B, /),
where Apply is augmented to avoid division by 0 and instead return 1
when two terminal values being compared both equal 0 and return 0
otherwise. Apply can be further augmented to terminate early when
$|w_j / v_j| \neq 1$. C is a QuIDD vector containing the relative-phase
factors. If C contains a terminal value of 0, then A and B do not
differ by relative phase. Since a call to Apply implements this
algorithm, the runtime and memory complexity are $O(|A||B|)$.
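For states, the augmented division can be sketched on explicit arrays as follows. This is an illustration rather than the QuIDD routine: the function name `relative_phases` and the tolerance `tol` are assumptions, and a failed check is signalled by returning `None` instead of by a 0 terminal.

```python
import cmath
import math

def relative_phases(a, b, tol=1e-9):
    """Return the factors e^{i*theta_k} with a[k] = e^{i*theta_k} * b[k],
    or None if the vectors are not equal up to relative phase."""
    factors = []
    for w, v in zip(a, b):
        if w == 0 and v == 0:
            factors.append(1)        # avoid 0/0; this entry carries no phase
            continue
        if w == 0 or v == 0:
            return None              # zeros in different positions
        q = w / v
        if abs(abs(q) - 1.0) > tol:
            return None              # early termination: not a pure phase
        factors.append(q)
    return factors

s = 1 / math.sqrt(2)
psi = [s, 0, 0, s]
phi = [cmath.exp(0.345j) * s, 0, 0, cmath.exp(0.457j) * s]
phases = relative_phases(phi, psi)
```

Unlike the modulus-based checks, the result `phases` contains the relative-phase factors themselves, here approximately $e^{0.345i}$ and $e^{0.457i}$ at the two non-zero positions.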

Element-wise division for operators is more complicated. For QuIDD
operators U and V, W = Apply(U, V, /) is a QuIDD matrix with the
relative-phase factor $e^{i\theta_j}$ along row $j$ in the case of
phases appearing on the left side and along column $j$ in the case of
phases appearing on the right side. In the first case, every entry of a
row is the same, so all columns of W are identical, meaning that the
support of W does not contain any column variables. Similarly, in the
second case the support of W does not contain any row variables. A
complication arises when 0 values appear in either operator. In such
cases, the support of W may contain both variable types, but the
operators may in fact be equal up to relative phase. Figure 9 presents
an algorithm based on Apply which accounts for these special cases by
using a sentinel value of 2 to mark valid 0 entries that do not affect
relative-phase equivalence.⁴ These entries are recursively ignored by
skipping either row or column variables with sentinel children (S
specifies row or column variables), which effectively fills copies of
neighboring row or column phase values in their place in W. The
algorithm must be run twice, once for each variable type. The size of W
is $O(|U||V|)$ since it is created with a variant of Apply.

4.4 Non-0 Terminal Merge 

A necessary condition for relative-phase equivalence is that
zero-valued elements of each state vector appear in the same
locations, as expressed by the following lemma.

Lemma 3 A necessary but not sufficient condition for two states
$|\varphi\rangle = \sum_{j=0}^{N-1} v_j |j\rangle$ and
$|\psi\rangle = \sum_{k=0}^{N-1} w_k |k\rangle$ to be equal up to
relative phase is that $\forall\, v_j = w_k = 0,\ j = k$.

Proof. If $|\psi\rangle = |\varphi\rangle$ up to relative phase,
$|\psi\rangle = \sum_{k=0}^{N-1} e^{i\theta_k} v_k |k\rangle$. Since
$e^{i\theta} \neq 0$ for any $\theta$, if any $w_k = 0$, then
$v_j = 0$ must also be true where $j = k$. A counter-example proving
insufficiency is
$|\psi\rangle = (0, 1/\sqrt{3}, 1/\sqrt{3}, 1/\sqrt{3})^T$ and
$|\varphi\rangle = (0, 1/2, 1/\sqrt{2}, 1/2)^T$. □
QuIDD canonicity may now be exploited. Let A and B be the QuIDD
representations of the states $|\psi\rangle$ and $|\varphi\rangle$,
respectively. First compute C = Apply(A, ⌈|·|⌉) and
D = Apply(B, ⌈|·|⌉), which converts every non-zero terminal value of A
and B into a 1. Since C and D have only two terminal values, 0 and 1,
checking if C = D satisfies Lemma 3. Canonicity ensures this check
requires $O(1)$ time and memory. The overall runtime and memory
complexity of this algorithm is $O(|A| + |B|)$ due to the unary Apply
operations. This algorithm also applies to operators since Lemma 3 also
applies to $u_{j,k} = e^{i\theta_j} v_{j,k}$ (phases on the left) and
$u_{j,k} = e^{i\theta_k} v_{j,k}$ (phases on the right) for operators U
and V.
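The merge step can be illustrated on explicit arrays, where the 0/1 pattern plays the role of the QuIDDs C and D. The names `nonzero_pattern` and `passes_nonzero_merge` are assumptions introduced for this sketch; the first pair of test vectors is the counter-example from the proof of Lemma 3.

```python
import math

def nonzero_pattern(vec):
    # Ceiling of the modulus for unit-norm data: 0 stays 0, everything else becomes 1.
    return tuple(0 if x == 0 else 1 for x in vec)

def passes_nonzero_merge(a, b):
    """Necessary (not sufficient) condition for relative-phase equivalence."""
    return nonzero_pattern(a) == nonzero_pattern(b)

r3 = 1 / math.sqrt(3)
psi = [0, r3, r3, r3]
phi = [0, 0.5, 1 / math.sqrt(2), 0.5]
```

The pair `psi`, `phi` passes the check even though the moduli of its entries differ, which is exactly the insufficiency stated in Lemma 3; a pair whose zeros sit in different positions is rejected.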

4 Any sentinel value larger than 1 may be used since such values do not 
appear in the context of quantum circuits. 



RP_DIV(A, B, S) {
    if (A == New_Terminal(0)) {
        if (B != New_Terminal(0))
            return New_Terminal(0);
        return New_Terminal(2);
    }
    if (Is_Constant(A) and Is_Constant(B)) {
        nrp = Value(A)/Value(B);
        if (sqrt(real(nrp)*real(nrp) +
                 imag(nrp)*imag(nrp)) != 1)
            return New_Terminal(0);
        return New_Terminal(nrp);
    }
    if (Table_Lookup(R, RP_DIV, A, B, S)) return R;
    v = Top_Var(A, B);
    T = RP_DIV(A_v, B_v, S);
    E = RP_DIV(A_v', B_v', S);
    if ((T == New_Terminal(0)) or
        (E == New_Terminal(0)))
        return New_Terminal(0);
    if ((T != E) and (Type(v) == S)) {
        if (Is_Constant(T) and Value(T) == 2)
            return E;
        if (Is_Constant(E) and Value(E) == 2)
            return T;
        return New_Terminal(0);
    }
    if (Is_Constant(T) and Value(T) == 2)
        T = New_Terminal(1);
    if (Is_Constant(E) and Value(E) == 2)
        E = New_Terminal(1);
    R = ITE(v, T, E);
    Table_Insert(R, RP_DIV, A, B, S);
    return R;
}



Figure 9: Element-wise division algorithm.

4.5 Modulus and DD Compare 

A variant of the algorithm presented in Subsection 4.1, which also
exploits canonicity, provides an asymptotic improvement for checking a
necessary and sufficient condition of relative-phase equivalence of
states and operators. As in Subsection 4.1, compute C = Apply(A, |·|)
and D = Apply(B, |·|). If A and B are equal up to relative phase, then
C = D since each phase factor becomes a 1. This check requires $O(1)$
time and memory due to canonicity. Thus, the runtime and memory
complexity is dominated by the unary Apply operations, giving
$O(|A| + |B|)$.
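The comparison can be mimicked with toy decision diagrams built from the moduli. The sketch below rests on several assumptions: nodes are tuples `(var, then_child, else_child)`, moduli are rounded to `digits` decimal places so that floating-point noise does not break structural equality, and the names `build`, `modulus_dd` and `modulus_dd_compare` are introduced here. Python compares tuples structurally, so the final equality test is not $O(1)$ as it would be with shared, canonical QuIDD nodes.

```python
import cmath
import math

def build(vec, var=0):
    # Reduced DD for a vector of length 2^k; terminals are numbers.
    if len(vec) == 1:
        return vec[0]
    half = len(vec) // 2
    lo, hi = build(vec[:half], var + 1), build(vec[half:], var + 1)
    return lo if lo == hi else (var, hi, lo)

def modulus_dd(vec, digits=12):
    # Unary "Apply" with the complex modulus, followed by DD construction.
    return build([round(abs(x), digits) for x in vec])

def modulus_dd_compare(a, b):
    return modulus_dd(a) == modulus_dd(b)

s = 1 / math.sqrt(2)
psi = [s, 0, 0, s]
phi = [cmath.exp(0.345j) * s, 0, 0, cmath.exp(0.457j) * s]

r3 = 1 / math.sqrt(3)
lemma3_psi = [0, r3, r3, r3]
lemma3_phi = [0, 0.5, s, 0.5]
```

The pair `psi`, `phi` compares equal, while the Lemma 3 counter-example, which passes the non-0 terminal merge, is rejected here because its moduli differ.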

4.6 Empirical Results for Relative-Phase 
Equivalence Algorithms 

The first benchmark for the relative-phase equivalence checking
algorithms creates a remote EPR pair, which is an EPR pair between the
first and last qubits, via nearest-neighbor interactions [6]. The
circuit is shown in Figure 10. Specifically, it transforms the initial
state $|00\ldots0\rangle$ into
$(1/\sqrt{2})(|00\ldots0\rangle + |10\ldots1\rangle)$. The circuit size
is varied, and the final state is compared to the state
$(e^{0.345i}/\sqrt{2})\,|00\ldots0\rangle + (e^{0.457i}/\sqrt{2})\,|10\ldots1\rangle$.

The results in Figure 11a show that all algorithms run quickly. The
inner product is the slowest, yet it runs in approximately 0.2 seconds
at 1000 qubits, a small fraction of the 7.6 seconds required to create
the QuIDD state vectors. Regressions of the runtime and memory data
reveal linear complexity for all algorithms to within 1% error. This is
not unexpected since the QuIDD representations of the states grow
linearly with the number of qubits (see Figure 11b), and the complex
modulus reduces the number

Figure 10: Remote EPR-pair creation between the first
and last qubits via nearest-neighbor interactions.

Figure 12: A quantum-circuit realization of a Hamiltonian
consisting of Pauli operators.

of different terminals prior to computing the inner product. 
These results illustrate that in practice, the inner product and 
element-wise division algorithms can perform better than 
their worst-case complexity. Element-wise division should 
be preferred when QuIDD states are compact since unlike 
the other algorithms, it computes the relative-phase factors. 

The Hamiltonian simulation circuit shown in Figure 12 is taken from
[14, Figure 4.19, p. 210]. When its one-qubit gate (boxed) varies with
$\Delta t$, it produces a variety of diagonal operators, all of which
are equivalent up to relative phase. Empirical results for such
equivalence checking are shown in Figure 13. As before, the matrix
product and element-wise division algorithms perform better than their
worst-case bounds, indicating that element-wise division is the best
choice for compact QuIDDs.

5 Conclusions 

Although DD properties like canonicity enable exact equiv- 
alence checking in 0(1) time, we have shown that such 
properties may be exploited to develop efficient algorithms 
for the difficult problem of equivalence checking up to 
global and relative phase. In particular, the global-phase 
recursive check and element-wise division algorithms ef- 
ficiently determine equivalence of states and operators up 
to global and relative phase, and compute the phases. In 
practice, they outperform QuIDD matrix and inner prod- 
ucts, which do not compute relative-phase factors. Other 
QuIDD algorithms presented in this work, such as the node- 
count check, non-0 terminal merge, and modulus and DD 
compare, exploit other DD properties to provide even faster 
checks but only satisfy necessary equivalence conditions. 
Thus, they should be used to aid the more robust algorithms. 
A summary of the theoretical results is provided in Table 3.
The algorithms presented here enable QuIDDs and other 
DD datastructures to be used in synthesis and verification 

Figure 11: (a) Runtime results and (b) size in nodes plotted with regressions for inner product, element-wise division, modulus
and DD compare, and non-0 terminal merge checking relative-phase equivalence of the remote EPR pair circuit.

Figure 13: (a) Runtime results and (b) size in nodes plotted with regressions for matrix product, element-wise division, modulus
and DD compare, and non-0 terminal merge checking relative-phase equivalence of the Hamiltonian $\Delta t$ circuit.

Table 3: Key properties of the QuIDD-based phase-equivalence
checking algorithms.

of quantum circuits. A fair amount of work has been done 
on optimal synthesis for small quantum circuits as well 
as heuristics for larger circuits via circuit transformations 
[15, 17]. Equivalence checking in particular plays a key role
in some of these techniques since it is often necessary to ver- 
ify the correctness of the transformations. Future work will 
determine how these equivalence checking algorithms may 
be used as primitives to enhance such heuristics. 